fix(cli): cp-style sandbox download and workspace-boundary check#1353
Open
laitingsheng wants to merge 6 commits into
Open
fix(cli): cp-style sandbox download and workspace-boundary check#1353laitingsheng wants to merge 6 commits into
laitingsheng wants to merge 6 commits into
Conversation
`openshell sandbox download` produced a directory at the host destination instead of writing the source bytes as a regular file, and did not refuse sandbox-side paths that resolved outside `/sandbox` (only `creds.json` was rejected, by filesystem policy rather than by an explicit check). Mirrors the cp-style design from the upload-side fixes (#667 / #694 and #885 / #952): destination resolution now respects trailing slashes and existing-directory destinations, single-file sources land as regular files via a staged extraction and atomic rename, and the function delegates the SSH tar stream to a shared helper. Adds an explicit lexical workspace-boundary check on the sandbox-side source — required on the download side because filesystem policy alone is not a substitute for an explicit check when crossing the trust boundary in the leak direction. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
…ember The lexical check in `validate_sandbox_source_path` clears any path whose components stay under `/sandbox` after `.` and `..` are folded out, but it cannot see symlinks. A workspace-relative symlink such as `/sandbox/etc-link -> /etc` lets `openshell sandbox download <name> /sandbox/etc-link/passwd /tmp/passwd` slip through, then `tar -C /sandbox/etc-link passwd` follows the link on the sandbox side and streams `/etc/passwd` back to the host. Add `canonicalize_sandbox_source_path` which runs `realpath -e --` over SSH and rejects the result unless it still equals `/sandbox` or starts with `/sandbox/`. The lexical guard stays as a cheap fail-fast for obviously-bad inputs, and the rest of the download flow now uses the canonical path everywhere a tar invocation or probe might dereference a symlink. While editing the single-file branch, also add the `--` separator before the basename in `tar cf - -C <parent> -- <name>`. Without it, a sandbox file whose basename starts with `-` (e.g. `--checkpoint-action=...`) is parsed by GNU tar as an option rather than a member, which at best fails the wrap and at worst triggers tar option behaviour. The directory branch passes literal `.` and is unaffected. Cover both fixes with pure-function tests: `is_under_sandbox_workspace` gets accept/reject pairs (the same predicate the post-realpath validator uses) and the new `build_single_file_tar_cmd` helper asserts the `--` separator and shell escaping survive any future refactor. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
`tempfile` was promoted from `[dev-dependencies]` to `[dependencies]` once the cp-style single-file download path started using `TempDir::new_in` from production code. The original `[dev-dependencies]` line stayed behind by accident — entries under `[dependencies]` are already available to test compilation, so the second declaration is dead weight. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Adds four end-to-end tests for `openshell sandbox download` so the happy path and both Codex regressions have explicit coverage: * `sandbox_download_file_only` — seeds a single file via the new `SandboxGuard::exec` harness helper and pulls it back, exercising `sandbox_sync_down_file` without an upload phase. * `sandbox_download_directory_only` — seeds a small directory tree and pulls it recursively, exercising `sandbox_sync_down_directory`. * `sandbox_download_rejects_symlinks_pointing_outside_workspace` — regression for the high-severity finding. Plants three escape shapes (`/sandbox/etc-link -> /etc`, `/sandbox/passwd-link -> /etc/passwd`, and a path with a symlinked component `/sandbox/etc-link/passwd`) and asserts all three are refused with a workspace-boundary error before any tar streams. Also checks the host destination stays empty so a silent leak would surface. * `sandbox_download_handles_dash_leading_basename` — regression for the medium-severity finding. Seeds a file named `--checkpoint-action=evil` and verifies it downloads with byte-for-byte contents, proving the `--` separator stops tar parsing the basename as an option. The harness gains a single new method, `SandboxGuard::exec`, that wraps `openshell sandbox exec --name <name> --no-tty -- <argv>`. Existing tests preferred the upload path to populate sandbox state, but the regression cases need to plant filesystem state the upload flow cannot produce (symlinks, files whose basenames clap would otherwise reject). Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
`openshell sandbox download` now refuses any source path that resolves outside `/sandbox` (lexically or through a symlink), and the single-file branch follows `cp`-style placement rules. Document both so users hitting the workspace-boundary error or the "land at exact path vs land inside dir" decision can self-serve. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
🌿 Preview your docs: https://nvidia-preview-pr-1353.docs.buildwithfern.com/openshell |
|
Label |
`sandbox_download_rejects_symlinks_pointing_outside_workspace` matched the substring `"outside the sandbox workspace"` against the CLI's error output. The check is correct in spirit but fragile in practice: miette wraps long single-line errors across multiple lines at terminal width and inserts a `│ ` continuation marker, so the substring breaks across the wrap and the test fails even though the canonicalisation step correctly refused the source. Split the match into three short markers — `resolves to`, `outside the`, `sandbox workspace` — each guaranteed to fit on one wrapped line. The combined check is stricter than the original (it now requires evidence that the remote `realpath -e` step actually ran, not just the boundary refusal) without inheriting the wrap fragility. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
openshell sandbox downloadhad three independent shortcomings: it wrote a directory at the host destination instead of the source bytes as a regular file, it refused only paths that escaped/sandboxlexically (a planted symlink such as/sandbox/etc-link -> /etcslipped through and lettar -Cfollow the link off-workspace), and the single-file tar wrapper omitted the--separator so a basename starting with--was parsed as an option. This PR brings the download path to parity with the upload-side cp-style fix shipped in #694 (#667) and #952 (#885), closes the symlink-escape gap surfaced by Codex review, and tightens the tar invocation.Related Issue
Fixes NVIDIA/NemoClaw#3345
Changes
sandbox_sync_downvalidates the sandbox-side source path with a lexical workspace-boundary check before issuing any SSH command. Paths that resolve outside/sandbox(e.g./etc/passwd,/sandbox/../etc/passwd,/sandboxed/secrets) are refused with a clear error.resolve_sandbox_source_path, which runsrealpath -e --on the sandbox via the existing SSH command runner and re-validates the canonical result against the workspace boundary. This closes the high-severity symlink-escape gap:/sandbox/etc-link(or/sandbox/etc-link/passwdvia a planted symlink) is now refused before any tar streams.cp-style destination semantics:dest(ordestalready exists as a directory) → file lands at<dest>/<source-basename>.destis taken as the exact file path.dest(with an early refusal whendestexists as a non-directory).tar cf - -C <parent> -- <basename>so a sandbox-side basename starting with--(e.g.--checkpoint-action=evil) is treated as a member to archive rather than a tar option.stream_sandbox_tarhelper, mirroring the upload-sidessh_tar_uploadrefactor from fix(cli): sandbox upload overwrites files instead of creating directories #694.sandbox_sync_downtakesdest: &strso trailing slashes survive (previously the value was converted to&Pathat the dispatch site, which dropped them).tempfilefrom[dev-dependencies]to[dependencies]for the new single-file staging directory, and drops the now-redundant[dev-dependencies]entry.Testing
mise run pre-commitpassesChecklist